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TECHNICAL MEMORANDUM X-64707 . 

FAST DIGITAL NOISE FILTER CAPABLE OF LOCATING 
SPECTRAL PEAKS AND SHOULDERS 

INTRODUCTION 

Much valuable information from an experiment may be lost because of 
a poor signal-to- noise ratio. The source of this unwanted noise is. usually a 
combination of ground loops, transient currents, reading errors, using equip- 
ment near the limits of its range, etb. In most experiments, one can assume, 
that the noise is a random event distributed normally about the true signal. . 
This is a Gaussian- shaped distribution, with most of the deviations caused by 
noise occurring within one standard deviation of the signal. It is normally 
assumed that the standard deviation of the noise is independent of the signal 
value and is also constant throughout the experiment. 

Under such assumptions, it is possible to utilize several techniques 
for enhancing the signal-to-noise ratio. An example is the simple RC filter 
used to remove the high-frequency noise component from an analog signal. 

In recent years, however, more and more experimentalists have turned to 
computers for analyzing results, and somewhere in the course of the experi- 
ment it is necessary to digitize the stream of analog data. Unfortunately, 
the noise present in the analog signal carries over into the digital signal. 

Thus, the need arises for a digital filter which is capable of enhancing 
the digitized signal. Several of these filters exist [ 1-4] , and this report will 
describe one based upon the principle of least squares. The idea behind this 
filter was presented by Savitsky and Golay [ 5] and has been expanded and 
programmed by the authors to handle spectral data. This type filter has been 
found advantageous because of its high speed and because it not only ’’smooths” 
the data, but simultaneously finds the peaks and shoulders present in the 
spectrum. The following sections will describe the principles of the filter and 
its implementation in spectral analysis. 



THE THEORY OF CONVOLVING INTEGERS 


The object of this digital filter is to removb-as much noise as possible 
without degrading the signal. The filter operates by modifying a given point 
to be some function of itself and nearby points. An RC filter can use only 
past information, and this introduces a unidirectional distortion into the data; 
i. e. , phase shift. A digital filter, however, can take advantage of the stored 
array of data to utilize both past and future points, thus providing a better 
smooth than an analog filter. 

Two assumptions, concerning the data must be met if the filter is to be 
effective: (l) the data must occur at equally spaced intervals along the 
abscissa, and (2) the curve formed by the data in the filter must be reasonably 
smooth t 5] . The first of, these assumptions is always met in computer work 
because the abscissa is actually the time interval at which the data are sampled, 
and this is equal interval and stable to 0. 01-percent accuracy or better. This 
report will deal primarily with the filtering of the intensity since it is the 
signal of main interest. 

th 

A simple digital filter is a moving weighted average. The j point 
modified by the 2 m+1 points of which it is the center: 


m 

£ Vi + i /N • (u 

i = -m 

The coefficients c^ are integers which are chosen to give the desired 

weighting, and N is the normalization factor (in this case, N is equal to 
2 m+l). By allowing j to run through the index of the array, the data are 
smoothed using only simple arithmetic operations. It is clear that m can 
be set to any value, giving a 2 m+l point filter. 

The c. coefficients are called convoluting integers for the following 

reason. The filter can be considered an operator which forms the smoothed 
data y*(t) by integrating the raw data y(t) with a weighting function w(t) . 
Since the weighting at a point depends upon the time difference between the 
weighted point and the point being smoothed, the filtering operation can be 
written [ 6] 
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( 2 ) 


y*(t) = /w(r) y(t - r) dr . 

This integral is defined as the convolution of y(t) with w(t)i In a digital 
filter the weighting function is of the form 


m 

W ( T ) = ^ c. 6(r + i)/N (3) 

i = -m 


where 6(t) is the Dirac-delta function representing the discrete sampling 
of the data. Using this in equation (2) at time t = j gives 


■o> = f 


m 


E c - 5 (t + i) /N 


1 = -m 


y(j - t) dr 


m 

= E c.y(j + i)/N . 


1 = -m 



By equating y(j + i ) with y. this is 

j + x 

Thus, the c^'s are known as convoluting 


seen to be identical to equation (l). 
integers. 


While the moving weighted average works well for a quasi-dc signal, 
it tends to drastically distort a curve of large curvature, such as a peak. 

Thus, one would like to retain the simplicity of equation (l) but find a set of 
convoluting integers which does not alter the shape of the data. The most 
commonly used method for smoothly fitting a curve to a group of data points 

til 

is the method of least squares; therefore; we are led to try fitting an n degree 
polynomial to 2 m+1 points such that the sum of the squares of the residuals 
is minimized. Our goal is to express the polynomial value at the center point 
in terms of the 2 m+1 unsmoothed points, and we will then regard this as the 
smoothed value of the center point. This method will be seen to provide not 
only a set of convoluting integers for smoothing, but also sets for finding the 
first n derivatives. 
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The problem is to fit a polynomial to 2 m+1 points and then replace 
the center point by its polynomial, or smoothed, value. Since the data points 
are assumed to be uniformly spaced and odd in number, they can be normalized 
and centralized to be integer values centered at zero [ 7] ; i. e. , t. = i . The 
th ! • • . .1 

n degree polynomial is then of the form 


f. 

i 


n 

Z ' 

k = 0 


.k 

i 


= b + b<i+ . . . 
o 1 


b i n 
n 


( 5 ) 


But we only require the value at i = 0 , so clearly y* = f^ = b Q . We find that 

the smoothed value at a point is merely the first coefficient of the best-fit 
polynomial centered at the point. By further taking derivatives of the poly- 
nomial, one can show 


y* = f = b 
o o 


dy* di 
di dt 



At di At 


dy* 

dt 


( 6 ) 



1 , 

(At) S 


d S f s' b 
- ■ o _ s, 

di s (At) s 




where At is the constant step size between: abscissa points; i. e. * the analog- 
to-digital conversion time. Thus, the smoothed value at the first n deriva- 
tives at the center point can be found by solving for the proper regression 
coefficients. ' ■ •. ■ , . ■ 
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The solution for the regression coefficients b can be found in 

s 

numerous texts on regression analysis and will not be given here.' One can 
show that the solution is of the form 


si b 

s 


m 

S c i y i /N 


= -m 


( 7 ) 


This is identical to equation (l), with j = 0 , since the points here are 
centralized. Thus, each b and, hence, each derivative can be evaluated 

S ' 

by a set of c^ of convoluting integers. These integers depend on the order 

of derivatives (0 to n), the number of points (2 m+ 1 ), and the order of the 
polynomial (n < 2 m+l). Large tables of these convoluting integers with 
their corresponding normalization factors can be found in Reference 5. 


It is also possible to convolve two of these sets of integers together, 
resulting in a single set of integers which performs the operations of the two 
original filters simultaneously. This is decidedly advantageous for program- 

ming. Thus, if we convolve a 2 m+l point n order smooth with a 2 p+1 

point k ^ 1 order s^ derivative, we obtain 


,s * 
dy o 


m 


dt 


s - 


N g (Atj i f -P 


i, s ( N . ^_ C j,o y i+j 


o 3 = -m 


N N (At) i = -p j = -m 
os 


£ y 

> c i, s c j,o y i +j ■ 


( 8 ) 


_1 ^ 

N Li' 

h = - (m+p) 
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where 




S; 


c. c 

I.s 3,0 


i+j = h 


N = N N (At) k 
os 


th 

The. c. are the convoluting integers of the k -order derivatives, the 
* ^ th 

c-. are those of the n order smooth, and the d, are those for the 

J,° n 

combined operation. The resulting filter has 2 (m+p)+l points. 


APPLICATION TO SPECTRAL ANALYSIS 

/ . .,•■••••• • • 

This technique is easily adapted to a program which smooths data and 
also finds peaks and shoulders. The resulting filter can accept roughly 2500 
points per second on a computer such as the Uni vac 1108. Figure 1 is a flow 
chart of the filter," and this section will discuss the various operations. 


In addition to smoothing spectral data, one wishes to know the locations 
and intensities of the spectral peaks. When a small peak occurs very close to 
a larger peak, the smaller one appears not as a separate peak but as a 
shoulder; i. e. , an unresolved peak. The problem of determining if shoulders 
are present and, if so, their precise location is a difficult task. However, 
utilization of the above approach has resulted in an accurate algorithm for 
finding peaks and shoulders. 


These operations have been included by the authors in the form of a 
computer subroutine. Input arguments are the x and y data points and a 
cutoff level (usually ~ 0. 1 percent of the maximum y) below which peaks 
are not considered. The smoothed data and the locations and intensities of 
all peaks and shoulders are returned to the main program. A description 
of the flowchart is given in the following paragraphs. 
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Figure 1. (Concluded) 







Smoothing 

A temporary 11-point array YS of smoothed points is used to ensure 
that only raw data are in the smoothing filter while only smoothed data are in 
the derivative filters. To get the filter started, the first 10 points of YS are 
initialized with a 3-point linear average of the corresponding points of Y. 

The main program loop does a 9-point cubic smooth of Y. and stores it in 

th 1 

the 11 point of YS: 


YS n - 


- (- 21 Y i- 


, + 14 Y. + 39 Y. • + 54 Y. , ' + 59 Y 
4 i-3 i-2 i-1 i 


+ 54 Y. + 39 Y. n + 14 Y. 0 - 21 Y.' \ /231 (9) 

l+l 1+2 1+3 1+4 1 7 \ / 


Note the symmetry of the coefficients about Y^, weighting past and future data 

equally. If YS 6 , the center point YS array, is now greater than the cutoff 
level, the YS array proceeds to the next section. If not, then YSj is dropped 
off as a smoothed point and stored in place of the corresponding value of Y, 
transforming the raw data vector into a smoothed vector. The filter then 
moves forward a point by shifting YS. to YS. and calculating a new value 

of YS n at the next point of Y. 1 1 . * 

Peak Finding 


To use only smoothed data in the calculations, all derivatives are 
found at YS e , the center of the YS array. The first derivative, DERI, is 
found with a 5-point cubic fit: 


DERI = YS 4 - 8 YS 5 + 8 YS 7 - YS 8 . 


( 10 ) 


The normalization factor is not used since absolute magnitudes are not needed 
for finding peak locations. This is because a peak is a relative maximum of 
the intensity, and, thus, occurs at a zero crossing; i.e. , where the first 
derivative changes sign. To distinguish peaks from valleys, one must also 
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require the change to be from positive to negative. Thus, if DERI and its 
product with the derivative at the last point are negative, a peak has occurred 
between the points and may be located by linear interpolation. To prevent 
any remaining noise from appearing as a peak, the program requires the 
intensity to have risen for at least four consecutive points preceding a peak. 
Even the smallest of true peaks should easily exceed this if the digital 
convertor sampling rate is adequate for the analog frequencies in question. 
Following the check for a peak, the YS array moves to the next section. 

Shoulder Finding 

The second derivative is now found at the same point as the first 
derivative. Since the taking of higher-order derivatives greatly enhances 
the noise level, compensation is obtained by further convolving the derivatives 
with a linear smooth. Thus, to find the second derivative, DER2, a 5-point 
linear smooth is combined with a 7-point cubic second derivative: 


DER2 = 5YS t + 5YS 2 + 2 YS 3 - 2 YS 4 - 5 YS 5 - 10 YS e 

- 5 YS 7 - .2 YS g +• 2 YS 9 + 5 YS 10 + 5 YS n . (ll) 

This has the effect of smoothing the data twice without altering the output YS 
of smoothed data. A shoulder represents a zero crossing in the second 
derivative, which can be distinguished from other inflection points because 
the product of the first and third derivatives is positive at shoulders and 
negative at other zero crossings. Thus, the second derivative is checked for 
zero crossings in a manner similar to that used for finding peaks, and, if 
one is found, a 5-point cubic third derivative combined with a 3-point linear 
smooth is calculated: 


DER3 = - YS 3 + YS 4 + YS 5 YS 7 - YS 8 + YS 9 . (12 ) 

The product of equation (12) with the first derivative then determines whether 
a shoulder has been found. The intensity at. this point is misleading because 
of the influence of the larger peak nearby. Practice has shown that 90 percent 
of the intensity is a reasonable guess as to the true peak height. With excep- 
tionally poor data, even the doubly convoluted smoothing is often not enough 
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to offset the sensitiveness of the shoulder finder, and erroneous shoulders 
may be located. To overcome this, it may be necessary to expand the number 
of points in the filter and perform. an additional linear smooth before finding 
the second derivative. 

Following the finding of a peak, a pre-set criterion is used to deter- 
mine if the peak is reasonably distant from the preceding peak or if it is so 
close that they should be combined into one peak. In case of shoulders, the 
location and intensity are generally in error from the true values by a few 
percent. This can be corrected by the use of an additional subroutine which 
performs multiparameter optimization. The authors are presently preparing 
a note on this subject. Having completed these operations, YS t is dropped 
as a smoothed point, the array is shifted forward a point, and the loop is 
started over by calculating a new YS lt . 


TEST CASE RESULTS 

Figure 2 shows the results of filtering a mass spectrum from a reside 
ual gas analyzer, the authors* main application of the subroutine. The spec- 
trum in this case is synthetic, made by combining Gaussian peaks with the 
locations and intensities listed in the figure as Input. The values determined 
by the filter are listed under Zero Crossing. The peaks range from intensities 
of 0. 04 (not even visible) to 8. 0, a factor of 200. To make the spectrum more 
realistic, Gaussian-distributed random noise with a standard deviation of 0. 01 
was added. 

Had the mass spectrum been real, it would have been necessary to 
smooth the mass values. An 11-point linear smooth operates well for a 
quadrupole spectrometer with a linear sweep. For a cycloid spectrometer, 
swept by an RC decay, a linear smooth of the logarithms of the mass values 
followed by exponentiating the result gives satisfactory values. However, 
for the synthetic spectrum, the mass values are generated accurately and 
require no smoothing. Thus, only the intensity values, generated at 40 points 
per AMU, were passed through the filter. 

As Figure 2 will show, the filter correctly located the three peaks and 
three shoulders that were present. The curve passing through the black 
squares is the linear recombination of Gaussian waveforms using the deter- 
mined values of the peaks. It follows the signal quite well except between 
45. 6 and 46. 0. The problem here is that the peaks are so close that they 
overlap considerably, causing the signal at the 45. 5 peak to be greater than 
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Figure 2. Subroutine SM09 determination of peaks’ and shoulders’ synthetic spectrum. 


the true peak height. Also, at the 45. 8 peak (which, incidentally, would 
probably not even be found by eye), the estimation of 90 percent of the signal 
for the peak height was in error by 10 percent. The other four peaks are 
quite accurate, with an average mass error of 0. 03 AMU and an average 
intensity error of 0.01. 

Figure 3 shows the result of passing the smoothed synthetic spectrum 
through a subroutine which performs multiparameter optimization. The 
positions and intensities of the peaks and shoulders found by digital filters 
are used as initial parameters in the optimization. An iterative technique 
then modifies these parameters so that the rms deviation between the smoothed 
and the optimized signals is minimum. As the figure shows, the combination 
of the filter with an optimization algorithm results in extremely accurate 
values of the parameters. 

Figures 4 and 5 show the results of filtering data from a cosmic ray 
proportional counter associated with the High Energy Astronomy Observatory 
experiment and represent a good example of applying the filter to actual 
experimental results. The curve on the left of each figure is the raw data, 
and the one on the right is smoothed, optimized data. The curves, recorded 
on a multi-channel analyzer, were produced by radioactive decay of Fe-55. 

The right peak is the photo peak resulting from ion-pair production in the argon 
of the detector by 5. 9 KeV photons produced by K-capture in the source. There 
is also a probability, although considerably less, that the same photons will 
remove a K electron from the argon, causing the so-called escape peak on 
the left. The presence of the two peaks is clearly seen in Figures 4 and 5. 

The YMAX and XBAR columns are the peak intensities and locations after 
using optimization subroutine. Note the success of the filter in removing 
excessive noise. 


CONCLUSION 


Based on these and other results, the authors feel that this digital 
filter, founded on the concept of convolving integers, is a reliable technique. 

In addition to its fast- smoothing capability, it also possesses the ability to 
easily and quickly determine derivatives and, hence, find peaks and shoulders. 
This should prove to be of value in many forms of spectroscopy. 

It should be mentioned that the particular sets of convoluting integers 
used in the examples are not rigidly determined. They were chosen through 
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Figure 3. Subroutine SM09 optimization of peaks' and shoulders’ synthetic spectrum. 
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Figure 4. Proportional counter ddta. 
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Figure 5. Proportional counter data. 



experience because they appeared to work the best. Each individual experi- 
ment should try different combinations of the number of points and the order 
of the polynomial to decide which works best for the type of data involved. 

As would be expected, increasing the number of points or decreasing the 
polynomial order causes less "bending" in the fit and, hence, better smoothing, 
but when carried too far, it distorts the signal shape. The necessary compro- 
mise must be determined on the basis of its merits in each individual experi- 
ment. 


Further efforts are being made to use this technique in conjunction 
with Fast Fourier Transforms and other methods of data handling. Further- 
more, the use of this filter in combination with optimization techniques 
results in extremely accurate spectral values in which the effects of noise 
have been almost entirely eliminated. 
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